How our business compares to other businesses (over time and in February 2020)
How the components of the net promoter score for our business have changed over time
In February 2020, our business was performing well: only three other businesses had a higher NPS. Looking at our own promoter ratings over time, February 2020 was also the month with our highest number of promoters.
In 2019, only competitor A showed a slightly negative trend (though it still ranked highest), whereas our company and competitors B, C, and D showed no change or a slight increase. After December 2019, our business consistently held fourth position and showed a slight positive change over time.
Passive responses dominated initially but decreased toward the end of the period. Both detractor and promoter responses grew, and promoter responses eventually became the largest of the three categories.
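For reference, the three components combine into a single score by subtracting the detractor percentage from the promoter percentage. The sketch below uses made-up counts (hypothetical, not taken from our survey data), and the function name `net_promoter_score` is my own.

```python
def net_promoter_score(promoters: int, passives: int, detractors: int) -> float:
    """Return the NPS: percentage of promoters minus percentage of detractors."""
    total = promoters + passives + detractors
    if total == 0:
        raise ValueError("need at least one response")
    return 100 * (promoters - detractors) / total


# Hypothetical counts, for illustration only: passives shrink while
# promoters and detractors both grow.
early = net_promoter_score(promoters=30, passives=50, detractors=20)
late = net_promoter_score(promoters=45, passives=25, detractors=30)
print(early, late)
```

Passives only enter through the total, so a period in which promoters and detractors both grow can still raise the score as long as promoters grow more.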
Q2. Create a graphic that shows how many respondents there were in each country.
Present the countries in an order that is more interesting than alphabetical. Were they roughly equal or were there notable differences among the countries?
Canada had by far the most respondents, followed by the US and Great Britain. In general, English-speaking countries seemed to have more respondents than non-English-speaking countries, although Australia (1,773) had fewer than the Netherlands (1,891).
import pandas as pd
import altair as alt

q2_url = "https://calvin-data304.netlify.app/data/wvs.csv"
q2 = pd.read_csv(q2_url)

# Count the respondents (rows) in each country
q2_count = q2.groupby("country").size().reset_index(name="count")
q2_count

  country  count
0     AUS   1773
1     CAN   4018
2     DEU   1520
3     GBR   2399
4     KOR   1245
5     MEX   1729
6     NLD   1891
7     SWE   1194
8     USA   2552
# Graph for q2
alt.Chart(q2_count).mark_bar().encode(
    x=alt.X("count:Q", title="respondents"),
    y=alt.Y("country:N").sort("-x"),
)
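To put a number on "roughly equal or not", one option is to turn the counts into shares of all respondents. This is a minimal sketch that re-types the counts from the output above instead of reloading the CSV; the variable names are my own.

```python
# Respondent counts copied from the q2_count output above
counts = {
    "AUS": 1773, "CAN": 4018, "DEU": 1520, "GBR": 2399, "KOR": 1245,
    "MEX": 1729, "NLD": 1891, "SWE": 1194, "USA": 2552,
}

total = sum(counts.values())
equal_share = 1 / len(counts)  # share each country would have if all were equal

# Print the countries from most to fewest respondents with their share
for country, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{country}: {n / total:.1%} (equal split would be {equal_share:.1%})")

largest = max(counts, key=counts.get)
smallest = min(counts, key=counts.get)
ratio = counts[largest] / counts[smallest]
print(f"{largest} has {ratio:.1f} times as many respondents as {smallest}")
```

Canada alone accounts for about 22% of the respondents, roughly twice an equal share, so the countries are clearly not equal.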
Q3
There are three age-related variables: age, age3, and age6. The latter two put respondents into 3 and 6 age groups respectively. Create graphics that let you see what the age groupings are and check whether these are the same across all the countries.
import altair as alt

# Keep one row per distinct (country, age, age3, age6) combination. The plots
# look the same, and the data stays under Altair's 5,000-row limit.
age_groups = q2[["country", "age", "age3", "age6"]].drop_duplicates()

# Point chart for the 'age3' variable: individual ages (x) against their group (y)
chart_age3 = alt.Chart(age_groups).mark_point(filled=True, size=40).encode(
    # Tick every 5 years so the group boundaries are easy to read
    x=alt.X("age:Q", title="age", axis=alt.Axis(values=list(range(0, 100, 5)))),
    y=alt.Y("age3:N", title="age3"),
    color="country:N",
).properties(width=400, height=300, title="Distribution of Age Groups (age3)")

# Point chart for the 'age6' variable
chart_age6 = alt.Chart(age_groups).mark_point(filled=True).encode(
    x=alt.X("age:Q", title="Age"),
    y=alt.Y("age6:N", title="Age6"),
    color="country:N",
).properties(width=400, height=300, title="Distribution of Age Groups (age6)")

# Arrange the charts horizontally
chart = alt.hconcat(chart_age3, chart_age6)
chart

alt.HConcatChart(...)
Because the points for the different countries mostly overlap, the binning appears to be the same across all the countries.
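Overlapping colors are hard to judge by eye, so a numeric check may help. If the binning is the same everywhere, every age should map to exactly one group label regardless of country. The sketch below assumes the `age` and `age3` columns used above; the helper name `ages_with_conflicting_groups`, the toy rows, and the group labels are made up for illustration and are not the real labels in the data.

```python
import pandas as pd


def ages_with_conflicting_groups(df: pd.DataFrame, group_col: str) -> pd.Series:
    """Return the ages that are assigned to more than one group label."""
    labels_per_age = df.groupby("age")[group_col].nunique()
    return labels_per_age[labels_per_age > 1]


# Made-up toy rows (hypothetical labels), standing in for the survey data
toy = pd.DataFrame({
    "country": ["AUS", "AUS", "CAN", "CAN", "CAN"],
    "age": [25, 40, 25, 40, 60],
    "age3": ["18-29", "30-49", "18-29", "30-49", "50+"],
})

conflicts = ages_with_conflicting_groups(toy, "age3")
print("same binning everywhere:", conflicts.empty)
```

On the real data this would be called as `ages_with_conflicting_groups(q2, "age3")` and again with `"age6"`; an empty result means no age is binned differently in any country.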
Using individual ages (age) leads to a cluttered and complex graph, making it hard, though possible, to discern trends. Also, interpreting individual ages can be challenging, as there are no natural groupings to help understanding. Lastly, using individual ages lacks meaningful aggregation, which can obscure important trends and insights.
Reflection: The loess line is similar to the binned-age line, but since it is fitted to the individual points of the scatter plot, the curve is smooth and does a better job of capturing the underlying trend.